Projet EU DEMERSTEM http://pescao-demerstem.org/ Projet EU DEMERSTEM http://pescao-demerstem.org/

Introduction

Le package R GPSMonitoring, developpé dans le cadre du projet UE/CEDEAO DEMERSTEM a été initié pour faciliter la gestion des données GPS dans le cadre d’un suivi des pêcheries artisanales. Il propose des fonctions de filtre (GPS.curation) qui vont permettre de selectionner uniquement les positions en mer pour ensuite les qualifier en pêche ou non pêche.

Des fonctions du packages vont en outre permettre :

Pour installer ce packages, veuillez utiliser la commande suivante : devtools::install_github(‘polehalieutique/GPSMonitoring’)

Charger les fichiers GPX

Les fichiers GPX sont des fichiers XML utilisés par de nombreux GPS du marchés.

La fonction GPX.load va vous permettre de charger un fichier GPX ou l’ensemble des fichiers GPX d’un répertoire donné. Le package lui même est fourni avec un jeu de données (DEMERSTEM_GPS.Rdata) qui contient un echantillon d’environ 600 000 positions GPS collectés par le Centre National des Sciences Halieutiques De Boussoura (Guinée) dans le cadre du projet DEMERSTEM. Le jeu de données concerne des pirogues suivies au port artisanal de Kamsar Port Néné (KPN). Le jeux de données correspond à un data paper (DEMERSTEM project : dx.doi.org)

Le jeux de données collecté par les collègues Guinéens (Merci à Mohamed Soumah) permettra d’illustrer les fontionnalités du package dans ce document. Lors du chargement du fichier DEMERSTEM_gps.Rdata, inclus dans le package, une dataframe GPSdataset sera disponible.

data(GPSdataset)

head(GPSdataset) %>%  kable()
code_village code_engin code_pecheur date_heure longitude latitude
KPN PA 0006 2019-11-22 10:34:18 -14.9939 10.4980
KPN PA 0006 2019-11-22 10:31:17 -14.9983 10.4978
KPN PA 0006 2019-11-22 07:57:51 -15.0129 10.4932
KPN PA 0006 2019-11-21 17:21:08 -15.0262 10.4830
KPN PA 0006 2019-11-21 15:28:03 -15.0324 10.4176
KPN PA 0006 2019-11-21 12:57:27 -15.0383 10.4027
GPSdataset<-GPSdataset %>% filter(date_heure<"2019-09-30 20:20:53 CEST")

Le jeu de données étant un peu large pour la demonstration je vais le réduire à une plage de temps plus faible qui se terminera le 30 septembre 2019.

Ce jeux de données dataframe va être transformé en objet SF. Nous ajoutons aussi le nom du fichier GPX d’origine. Celui-ci, dans le protocole de collecte de données, renseigne le village, l’engin et le pêcheur suivi. Une fois ces manipulations effectuée, on peut regarder les données en utilisant la fonction geom_sf.

GPSdataset %>% mutate(filename=paste(code_village,code_engin,code_pecheur,'.gpx',sep='_')) %>% 
arrange (filename) %>% dplyr::distinct(code_village,code_engin,code_pecheur,filename) %>% 
dplyr::mutate(track_fid=row_number(),track_seg_id=track_fid) %>% 
inner_join(GPSdataset) %>%  
group_by(filename) %>% arrange (filename,date_heure) %>% dplyr::mutate(track_seg_point_id = row_number()) %>% 
  dplyr::rename(time=date_heure) %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326,remove=FALSE)->gps.all
## Joining, by = c("code_village", "code_engin", "code_pecheur")
ggplot(gps.all)+geom_sf(aes(color=filename),size=0.2)

Une carte du trait de côte en lien avec les données est aussi mise à la disposition au travers du package (data/fond.Rdata)

data(fond)

ggplot(gps.all)+geom_sf(data=fond)+geom_sf(aes(color=filename),size=0.2)

Préparation des données

La préparation des données peut être décomposée en trois étapes : * le filtrage des données en mer * le découpage en sorties de pêche * la standardisation a posteriori des fréquences d’acquisition des données

Définition de la zone d’intérêt

Une fois les données chargées nous proposons de filter les données: * Dans un premier temps on défini la zone d’intérêt (emprise) et nous graderons toutes les positions comprises dans cette zone (pol.extent). * Dans un second temps, on peut eventuellement supprimer les positions incluses dans une zone d’exclusion. Typiquement, on peut retirer la zone du port en considérant un perimètre d’exclusion ou les activités sont trop proches du port pour être distinguées. Dans ce second cas on veut exclure les données incluses dans ce périmètre.

emprise<-matrix(c(-17,11,-14,11,-14,9,-15,9,-17,11),ncol=2, byrow=TRUE)
pol.extent <-st_as_sf(st_sfc(st_polygon(list(emprise))))
st_crs(pol.extent) = 4326

ports<-data.frame(code=c('KPN'),long=c(-14.61),lat=c(10.6789))
ports.sf<- st_as_sf(ports,coords=c("long","lat"),crs=4326)
exclude<-st_as_sf(st_union(st_buffer(ports.sf,1000)))

ggplot(exclude)+geom_sf(data=pol.extent)+geom_sf(fill='red')

gps.all.cur<-GPS.curation(gps.all,extent=pol.extent,exclude=exclude)
## [1] "input number of position :271600"
## [1] "output number of position :269378"
g1<-ggplot(gps.all)+geom_sf()+geom_sf(data=exclude,fill=rgb(0.8,0.11,0.1,0.5))+
  geom_sf(data=pol.extent,fill=rgb(0.1,0.8,0.1,0.5))

g2<-ggplot(gps.all.cur)+geom_sf()+geom_sf(data=exclude,fill=rgb(0.8,0.11,0.1,0.5))+
  geom_sf(data=pol.extent,fill=rgb(0.1,0.8,0.1,0.5))

ggarrange(g1,g2)

Une autre façon de définir la zone d’intérêt est d’utiliser la fonction create.extent qui va afficher une image satellite de la zone ainsi qu’un polygone construit avec la zone convexe créé à partir de l’ensemble des points de mon jeu de données.

Cet outil permettra de créé facilement un polygon (objet SF) qui définira la zone d’étude aussi précisément que vous le désirez (grace au zoom notamment pour éviter les zones portuaires.

pol.extent<-create.extent(st_convex_hull(st_union(gps.all)))

La video suivante est une démonstration de l’utilisation de cette fonction :

Une fois la zone créée, comme tout objet sf, vous pouvez la sauvegarder en shapefile (st_write) ou en Rdata (save(pol.extent,“monfichier.Rdata”))

data(emprise)


ggplot(emprise)+geom_sf()+
  ggtitle("More precise pol.extent created using create.extent function")

pol.extent<-emprise

gps.all.cur<-GPS.curation(gps.all,extent=pol.extent,exclude=exclude)
## [1] "input number of position :271600"
## [1] "output number of position :267046"
g1<-ggplot(gps.all)+geom_sf()+geom_sf(data=exclude,fill=rgb(0.8,0.11,0.1,0.5))+
  geom_sf(data=pol.extent,fill=rgb(0.1,0.8,0.1,0.5))

g2<-ggplot(gps.all.cur)+geom_sf()+geom_sf(data=exclude,fill=rgb(0.8,0.11,0.1,0.5))+
  geom_sf(data=pol.extent,fill=rgb(0.1,0.8,0.1,0.5))

ggarrange(g1,g2)

Découper une trajectoire en sorties de pêche

We considered that if there is more than n (minutes) between 2 position, we are starting a new traject, a new fihing trip.

The id of a GPS record is the filemename

limit<-600*120 #2 hours between two point and we consider a new traject
 #limit<-240
head(gps.all.cur)
## Simple feature collection with 6 features and 12 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -14.6416 ymin: 10.6178 xmax: -14.621 ymax: 10.6641
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 13
##   code_village code_en…¹ code_…² filen…³ track…⁴ track…⁵ time                longi…⁶ latit…⁷ track…⁸           geometry
##   <chr>        <chr>     <chr>   <chr>     <int>   <int> <dttm>                <dbl>   <dbl>   <int>        <POINT [°]>
## 1 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:02:18   -14.6    10.7       3  (-14.621 10.6641)
## 2 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:07:18   -14.6    10.7       4 (-14.6251 10.6546)
## 3 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:12:18   -14.6    10.6       5   (-14.63 10.6457)
## 4 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:17:18   -14.6    10.6       6 (-14.6341 10.6366)
## 5 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:22:18   -14.6    10.6       7  (-14.6376 10.627)
## 6 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:27:18   -14.6    10.6       8 (-14.6416 10.6178)
## # … with 2 more variables: `_leaflet_id` <dbl>, feature_type <chr>, and abbreviated variable names ¹​code_engin,
## #   ²​code_pecheur, ³​filename, ⁴​track_fid, ⁵​track_seg_id, ⁶​longitude, ⁷​latitude, ⁸​track_seg_point_id
GPS.data<-gps.all.cur
gps.all.cur_traj<-GPS.add_traj_number(gps.all.cur,limit)  
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
gps.all.cur_traj %>% mutate(x = sf::st_coordinates(.)[,1],
                y = sf::st_coordinates(.)[,2])->gps.all.cur_traj


unique(gps.all.cur_traj$track_fid)
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  18  19  20  21  22  23  25  26  27  28  29  30  31  32
##  [31]  33  34  35  36  37  39  40  41  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60  61  62  63  64
##  [61]  65  66  67  68  69  70  71  72  73  74  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95
##  [91]  97  99 100 101 102 103 104 105 106 107 108 109 110 111 113 114 115 116 117 118 119 120 122 123 124 126 127 128 129 130
## [121] 131 132 133 134 135 136 137 139 140 141 142 143 144 145
unique(gps.all.cur_traj[gps.all.cur_traj$track_fid==2,]$no_trajet)
## [1] 2
head(gps.all.cur_traj)
## Simple feature collection with 6 features and 16 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -14.6465 ymin: 10.6091 xmax: -14.6251 ymax: 10.6546
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 17
##   code_village code_en…¹ code_…² filen…³ track…⁴ track…⁵ time                longi…⁶ latit…⁷ track…⁸           geometry
##   <chr>        <chr>     <chr>   <chr>     <dbl>   <int> <dttm>                <dbl>   <dbl>   <int>        <POINT [°]>
## 1 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:07:18   -14.6    10.7       4 (-14.6251 10.6546)
## 2 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:12:18   -14.6    10.6       5   (-14.63 10.6457)
## 3 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:17:18   -14.6    10.6       6 (-14.6341 10.6366)
## 4 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:22:18   -14.6    10.6       7  (-14.6376 10.627)
## 5 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:27:18   -14.6    10.6       8 (-14.6416 10.6178)
## 6 KPN          FMCl      0003    KPN_FM…       1       1 2019-04-14 10:32:18   -14.6    10.6       9 (-14.6465 10.6091)
## # … with 6 more variables: `_leaflet_id` <dbl>, feature_type <chr>, no_trajet <dbl>, duree <dbl>, x <dbl>, y <dbl>, and
## #   abbreviated variable names ¹​code_engin, ²​code_pecheur, ³​filename, ⁴​track_fid, ⁵​track_seg_id, ⁶​longitude, ⁷​latitude,
## #   ⁸​track_seg_point_id

Standardize duration between 2 position

As we can see here, the ping frequency is not constant. So we decide to recalibrate all the trajectory using a constant frequency of dt=300 s (5 minutes). The package ADEhabitat is used.

ggplot(filter(R.gps.all.cur_traj,duree<350))+geom_histogram(aes(x=duree))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

projcrs <- "+proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0"
R.gps.all.cur_traj <- st_as_sf(x = R.gps.all.cur_traj,
               coords = c("x", "y"),
               crs = projcrs)



ggplot(R.gps.all.cur_traj)+geom_sf(aes(color=no_trajet))+ggtitle(paste("Tracks redistribute in a ",step_dt," period"))

## Join Observation data

Once the GPS pings are ready, we have now to predict fishing event. First we have to observe fishing events and to flag them on GPS positions.

The first set of observed data could be provided using on board observers. The datset of observed fishing event will be composed of start time and end time of fishing event by observe fishermen.

data(Observed_FO)

head(Observed_FO) %>% kable()
X code_village code_engin code_pecheur start_op end_op fishing
1 KPN FMCl 3 2019-04-14 12:37:18 2019-04-14 16:27:18 active
2 KPN FMCl 3 2019-04-14 17:47:18 2019-04-15 09:06:18 active
3 KPN FMCl 3 2019-04-15 18:02:18 2019-04-15 22:03:18 active
4 KPN FMCl 3 2019-04-15 22:52:18 2019-04-16 05:35:18 active
5 KPN FMCl 3 2019-04-16 17:45:18 2019-04-17 02:00:18 active
6 KPN FMCl 3 2019-05-16 17:03:10 2019-05-16 22:43:10 active

We thus need to join observed fishing event to GPS dataset using (code_village,code_engin,code_pecheur) and the date of the GPS position that could be rely to start and end fishing operation.

R.gps.all.cur_traj %>% tidyr::separate(filename,c('code_village','code_engin','code_pecheur'),remove=FALSE,sep='_') %>% 
  mutate(code_pecheur=as.numeric(code_pecheur)) %>%  mutate(longitude = sf::st_coordinates(.)[,1],
                                             latitude = sf::st_coordinates(.)[,2])->R2
## Warning: Expected 3 pieces. Additional pieces discarded in 100960 rows [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
## 17, 18, 19, 20, ...].
#add id number for each position of a traject
R2$id<-(R2 %>% group_by(no_trajet) %>% mutate(id=row_number()) %>% select (id))$id

R2$date<-as.POSIXct(R2$date)
R2 %>% mutate(obs=as.integer(1),activity='UK')->R2
head(R2)
## Simple feature collection with 6 features and 20 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -14.6465 ymin: 10.6091 xmax: -14.6251 ymax: 10.6546
## CRS:           +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
##                      date      dx      dy        dist  dt        R2n abs.angle   rel.angle no_trajet track_fid
## 1...1 2019-04-14 10:07:18 -0.0049 -0.0089 0.010159724 300 0.00000000 -2.074071          NA         1         1
## 2...2 2019-04-14 10:12:18 -0.0041 -0.0091 0.009980982 300 0.00010322 -1.994107  0.07996368         1         1
## 3...3 2019-04-14 10:17:18 -0.0035 -0.0096 0.010218121 300 0.00040500 -1.920403  0.07370363         1         1
## 4...4 2019-04-14 10:22:18 -0.0040 -0.0092 0.010031949 300 0.00091801 -1.980924 -0.06052022         1         1
## 5...5 2019-04-14 10:27:18 -0.0049 -0.0087 0.009984989 300 0.00162649 -2.083731 -0.10280767         1         1
## 6...6 2019-04-14 10:32:18 -0.0051 -0.0083 0.009741663 300 0.00252821 -2.121779 -0.03804747         1         1
##                 filename code_village code_engin code_pecheur duree                 geometry longitude latitude id obs
## 1...1 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6251 10.6546)  -14.6251  10.6546  1   1
## 2...2 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300   POINT (-14.63 10.6457)  -14.6300  10.6457  2   1
## 3...3 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6341 10.6366)  -14.6341  10.6366  3   1
## 4...4 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300  POINT (-14.6376 10.627)  -14.6376  10.6270  4   1
## 5...5 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6416 10.6178)  -14.6416  10.6178  5   1
## 6...6 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6465 10.6091)  -14.6465  10.6091  6   1
##       activity
## 1...1       UK
## 2...2       UK
## 3...3       UK
## 4...4       UK
## 5...5       UK
## 6...6       UK
for (compteur in seq(1,length(Observed_FO$X)))
{  
#compteur<-1  
R2<-R2 %>% mutate(activity=case_when(
code_village==Observed_FO[compteur,]$code_village & code_engin==Observed_FO[compteur,]$code_engin  & code_pecheur==Observed_FO[compteur,]$code_pecheur & date>Observed_FO[compteur,]$start_op & date<Observed_FO[compteur,]$end_op ~ 'active',
TRUE ~ activity),obs=case_when(
code_village==Observed_FO[compteur,]$code_village & code_engin==Observed_FO[compteur,]$code_engin  & code_pecheur==Observed_FO[compteur,]$code_pecheur & date>Observed_FO[compteur,]$start_op & date<Observed_FO[compteur,]$end_op ~ Observed_FO[compteur,]$X,
TRUE ~ obs))
}
compteur<-40
liste_trajet_avec_obs<-R2 %>% st_drop_geometry()%>% filter(activity=='active')%>% dplyr::distinct(no_trajet)                           




ggplot(filter(R2,no_trajet %in% liste_trajet_avec_obs$no_trajet))+geom_sf(aes(color=as.factor(activity)),lwd=0.1)+ggtitle("Tracks redistribute in a 60s period with observation")

Modelization of fishing event using observed data

First an overview of observed data

Observed data are flaging GPS position to be active or not (ie. Fishing or not). Thus we can look at fishing or non fishing event and associated variable (speed, angle, R2n).

R2n is the the squared net displacement between the current relocation and the first relocation of the trajectory. We use distance between two point which is the same as the speed as frequency is constant.

liste_trajet_avec_obs<-R2 %>% st_drop_geometry()%>% filter(activity=='active')%>% dplyr::distinct(no_trajet)     

R2_avec_obs<-R2 %>%  filter(no_trajet %in% liste_trajet_avec_obs$no_trajet) 


R2_avec_obs %>% st_drop_geometry() %>% 
  group_by(code_engin,activity,dist) %>% dplyr::summarise(frequence=n()) %>%  ggplot()+
  geom_smooth(aes(x=dist,y=frequence,col=activity))+
  facet_wrap(~code_engin,scales = "free")+ggtitle('Fishing event ~ distance(=speed)')
## `summarise()` has grouped output by 'code_engin', 'activity'. You can override using the `.groups` argument.
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
## Warning: Removed 9 rows containing non-finite values (`stat_smooth()`).

bb<-st_bbox(R2_avec_obs)

ggplot(R2_avec_obs)+geom_sf(data=fond)+geom_sf(aes(color=activity),alpha=0.45,lwd=1)+
    scale_fill_viridis_c(option = "plasma", trans = "sqrt")+xlim(as.numeric(bb$xmin),as.numeric(bb$xmax))+ ylim(as.numeric(bb$ymin), as.numeric(bb$ymax))+facet_wrap(~code_engin)

unique(R2_avec_obs$code_engin)
## [1] "FMCl" "FMCy" "FMD"  "FMEg" "PA"   "YO"
engin_encours<-'FMCy'

head(R2_avec_obs)
## Simple feature collection with 6 features and 20 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -14.6465 ymin: 10.6091 xmax: -14.6251 ymax: 10.6546
## CRS:           +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
##                      date      dx      dy        dist  dt        R2n abs.angle   rel.angle no_trajet track_fid
## 1...1 2019-04-14 10:07:18 -0.0049 -0.0089 0.010159724 300 0.00000000 -2.074071          NA         1         1
## 2...2 2019-04-14 10:12:18 -0.0041 -0.0091 0.009980982 300 0.00010322 -1.994107  0.07996368         1         1
## 3...3 2019-04-14 10:17:18 -0.0035 -0.0096 0.010218121 300 0.00040500 -1.920403  0.07370363         1         1
## 4...4 2019-04-14 10:22:18 -0.0040 -0.0092 0.010031949 300 0.00091801 -1.980924 -0.06052022         1         1
## 5...5 2019-04-14 10:27:18 -0.0049 -0.0087 0.009984989 300 0.00162649 -2.083731 -0.10280767         1         1
## 6...6 2019-04-14 10:32:18 -0.0051 -0.0083 0.009741663 300 0.00252821 -2.121779 -0.03804747         1         1
##                 filename code_village code_engin code_pecheur duree                 geometry longitude latitude id obs
## 1...1 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6251 10.6546)  -14.6251  10.6546  1   1
## 2...2 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300   POINT (-14.63 10.6457)  -14.6300  10.6457  2   1
## 3...3 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6341 10.6366)  -14.6341  10.6366  3   1
## 4...4 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300  POINT (-14.6376 10.627)  -14.6376  10.6270  4   1
## 5...5 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6416 10.6178)  -14.6416  10.6178  5   1
## 6...6 KPN_FMCl_0003_.gpx          KPN       FMCl            3   300 POINT (-14.6465 10.6091)  -14.6465  10.6091  6   1
##       activity
## 1...1       UK
## 2...2       UK
## 3...3       UK
## 4...4       UK
## 5...5       UK
## 6...6       UK
R2_avec_obs %>% filter(code_engin==engin_encours) %>% ggplot()+geom_line(aes(x=date,y=dist))+
  geom_point(aes(x=date,dist,col = activity))  +
  facet_wrap(~no_trajet,scale='free')+ggtitle(paste("For gear ",engin_encours," Fishing event ~ distance (=speed)",sep=""))
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 3 rows containing missing values (`geom_point()`).

R2_avec_obs %>% filter(code_engin==engin_encours) %>% ggplot()+geom_line(aes(x=date,y=R2n))+
  geom_point(aes(x=date,R2n,col = activity))  +
  facet_wrap(~no_trajet,scale='free')+ggtitle(paste("For gear ",engin_encours," Relation observation et R2n",sep=''))

R2_avec_obs %>% filter(code_engin==engin_encours) %>% ggplot()+geom_line(aes(x=date,y=rel.angle))+
  geom_point(aes(x=date,rel.angle,col = activity))  +
  facet_wrap(~no_trajet,scale='free')+ggtitle(paste("For gear ",engin_encours,"  Relation observation et rel.angle",sep=''))
## Warning: Removed 2 rows containing missing values (`geom_line()`).
## Warning: Removed 29 rows containing missing values (`geom_point()`).

GLM model between fishing event and speed or other trajectory parameters

Model definition

observation<-'activity'

#On teste avec les 3 paramètres 
gear.glm<-model.traj.glm(filter(R2,code_engin==engin_encours),observation='activity',form='dist+abs.angle')
## [1] "observation2~dist+abs.angle"
summary(gear.glm)
## 
## Call:
## glm(formula = formula, data = traj2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.14771  -0.12709  -0.10865  -0.08811   0.93608  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.127874   0.006191  20.654  < 2e-16 ***
## dist        -11.553706   1.923562  -6.006 2.09e-09 ***
## abs.angle    -0.008435   0.003123  -2.701  0.00694 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.09465995)
## 
##     Null deviance: 332.15  on 3472  degrees of freedom
## Residual deviance: 328.47  on 3470  degrees of freedom
##   (26 observations effacées parce que manquantes)
## AIC: 1673.5
## 
## Number of Fisher Scoring iterations: 2
summary(gear.glm)
## 
## Call:
## glm(formula = formula, data = traj2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -0.14771  -0.12709  -0.10865  -0.08811   0.93608  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.127874   0.006191  20.654  < 2e-16 ***
## dist        -11.553706   1.923562  -6.006 2.09e-09 ***
## abs.angle    -0.008435   0.003123  -2.701  0.00694 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.09465995)
## 
##     Null deviance: 332.15  on 3472  degrees of freedom
## Residual deviance: 328.47  on 3470  degrees of freedom
##   (26 observations effacées parce que manquantes)
## AIC: 1673.5
## 
## Number of Fisher Scoring iterations: 2
plot(gear.glm)

Prédiction using GLM model

R2.pred<-glm.predict(filter(R2,code_engin==engin_encours),gear.glm,seuil=0.5)

p1<-ggplot(R2.pred)+geom_sf(aes(color=as.factor(predict.glm)),alpha=0.45,lwd=0.1)+
    scale_fill_viridis_c(option = "plasma", trans = "sqrt")
p2<-ggplot(filter(R2_avec_obs,code_engin==engin_encours))+geom_sf(aes(color=activity),alpha=0.45,lwd=0.1)+
    scale_fill_viridis_c(option = "plasma", trans = "sqrt")
ggarrange(p1,p2,ncol=1)

Obvisouly the Observed Fishing activities does not seem to be really efficient to produce a tight prediction

How to improve observed values

We can review observed fishing activities using visual interface: we define a new observation column : activity_plus

If we are not confident in using on board observation, the second way to have “observed” fishing event is to qualify trajectory in active or not active segments. To do that, we will use the silicoTraj function that will help you to qualify fishing event on a set of trajectories (the first 5 trajectories here selected in the longuest trajectories)

The dataframe traj_to_observe contains the list of 5 (top_n(5)) track number.

We will use the same function silicoTraj in two different modes sequencially : * First in mode speed where i qualify GPS ping using speed * Second one where i can qualify GPS ping directly on a map

R2 %>% st_drop_geometry() %>% dplyr::filter(code_engin==engin_encours) %>% dplyr::group_by(no_trajet) %>% 
  dplyr::summarize(nb_positions=n()) %>% dplyr::arrange(desc(nb_positions)) %>% dplyr::top_n(5) -> traj_to_observe

R2 %>% filter(no_trajet %in% traj_to_observe$no_trajet) %>% 
  ggplot()+geom_sf(aes(color=as.factor(no_trajet)))+facet_wrap(~no_trajet)

#I create the new column and set values to Unknown 
R2$activity_plus<-'UK'

for (i in traj_to_observe$no_trajet)
{
  R2<-track_replace(R2,silicoTraj(filter(R2,no_trajet==i),mode='speed'))
}


for (i in traj_to_observe$no_trajet)
{
  R2<-track_replace(R2,silicoTraj(filter(R2,no_trajet==i),mode='map'))
}

With these new set of data i try to improve the model

gear.glm.plus<-model.traj.glm(filter(R2,code_engin==engin_encours),observation="activity_plus",form= "dist+abs.angle")
## [1] "observation2~dist+abs.angle"
summary(gear.glm.plus)
## 
## Call:
## glm(formula = formula, data = traj2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.04042  -0.03348   0.01827   0.08290   0.45586  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.035e+00  2.207e-03  469.17   <2e-16 ***
## dist        -1.259e+02  6.622e-01 -190.05   <2e-16 ***
## abs.angle   -1.869e-02  1.097e-03  -17.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.02639266)
## 
##     Null deviance: 1155.92  on 7563  degrees of freedom
## Residual deviance:  199.55  on 7561  degrees of freedom
##   (75 observations effacées parce que manquantes)
## AIC: -6021.9
## 
## Number of Fisher Scoring iterations: 2
plot(gear.glm.plus)

R2.pred.plus<-glm.predict(filter(R2,code_engin==engin_encours),gear.glm.plus,seuil=0.5)


p1<-ggplot(R2.pred.plus)+geom_sf(aes(color=as.factor(predict.glm)),alpha=0.45,lwd=1)+
    scale_fill_viridis_c(option = "plasma", trans = "sqrt")
p2<-ggplot(filter(R2.pred.plus,code_engin==engin_encours))+geom_sf(aes(color=activity_plus),alpha=0.45,lwd=1)+
    scale_fill_viridis_c(option = "plasma", trans = "sqrt")
ggarrange(p1,p2,ncol=1)

Conclusion, on board observers data will be trashed and we can easily have better qualification using less energy, less funds but more expert knowledge in fishing events.

Add a new covariable : the number of position in the same spatial and temporal windows

R2test_retour<-all.add.nb.point(R2.pred.plus,r=2000,temp_windows=20)
## Joining, by = c("no_trajet", "id")
ggplot(filter(R2test_retour,no_trajet==traj_to_observe$no_trajet[1]))+geom_sf(aes(color=(circle2000)))

ggplot(filter(R2test_retour,no_trajet==traj_to_observe$no_trajet[1]))+geom_point(aes(x=id,y=circle2000,color=activity_plus))

Using this new variable, we try to improve the model

gear.glm.plus.nb<-model.traj.glm(filter(R2test_retour,code_engin==engin_encours),observation="activity_plus",form= "dist+abs.angle+circle2000")
## [1] "observation2~dist+abs.angle+circle2000"
summary(gear.glm.plus)
## 
## Call:
## glm(formula = formula, data = traj2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.04042  -0.03348   0.01827   0.08290   0.45586  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.035e+00  2.207e-03  469.17   <2e-16 ***
## dist        -1.259e+02  6.622e-01 -190.05   <2e-16 ***
## abs.angle   -1.869e-02  1.097e-03  -17.03   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.02639266)
## 
##     Null deviance: 1155.92  on 7563  degrees of freedom
## Residual deviance:  199.55  on 7561  degrees of freedom
##   (75 observations effacées parce que manquantes)
## AIC: -6021.9
## 
## Number of Fisher Scoring iterations: 2
summary(gear.glm.plus.nb)
## 
## Call:
## glm(formula = formula, data = traj2)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.01301  -0.03416   0.01471   0.07579   0.47465  
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  9.386e-01  1.321e-02  71.041  < 2e-16 ***
## dist        -1.163e+02  1.449e+00 -80.237  < 2e-16 ***
## abs.angle   -1.918e-02  1.095e-03 -17.504  < 2e-16 ***
## circle2000   2.550e-03  3.435e-04   7.421 1.29e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 0.02620525)
## 
##     Null deviance: 1155.92  on 7563  degrees of freedom
## Residual deviance:  198.11  on 7560  degrees of freedom
##   (75 observations effacées parce que manquantes)
## AIC: -6074.8
## 
## Number of Fisher Scoring iterations: 2
plot(gear.glm.plus.nb)

R2.pred.plus.nb<-glm.predict(filter(R2test_retour,code_engin==engin_encours),gear.glm.plus.nb,seuil=0.5) %>% filter(!is.na(predict.glm))


p1<-ggplot(R2.pred.plus.nb)+geom_sf(aes(color=as.factor(predict.glm)),alpha=0.45,lwd=1)+
    scale_fill_viridis_c(option = "plasma", trans = "sqrt")
p2<-ggplot(filter(R2.pred.plus.nb,code_engin==engin_encours))+geom_sf(aes(color=activity_plus),alpha=0.45,lwd=1)+
    scale_fill_viridis_c(option = "plasma", trans = "sqrt")
ggarrange(p1,p2,ncol=1)

The AIC of his model (same set of data but new variable in the formula) is better (-4300 instead of -3800). We could thus consider that adding thist circle2000 co variable imporve the model

Random Forest prediction

Random forest is another algorythm which is able to predict fishing event using observed values. Methodology is describe in folowing article :

[Estimating fishing effort in small-scale fisheries using GPS tracking data and random forests] (https://dx.doi.org/10.1016/j.ecolind.2020.107321).

model.traj.RF and RF.predict function are similar to model.traj.glm and gm.predict function.

#To select some 
form <-"dist+abs.angle+circle2000"
mod.RF<-model.traj.RF(traj=R2.pred.plus.nb ,observation='activity_plus',form=form)
## [1] "activity_plus~dist+abs.angle+circle2000"
R2.pred.plus.nb.RF<-RF.predict(traj=R2.pred.plus.nb,mod.RF)
## Joining, by = c("date", "dx", "dy", "dist", "dt", "R2n", "abs.angle", "rel.angle", "no_trajet", "track_fid", "filename",
## "code_village", "code_engin", "code_pecheur", "duree", "longitude", "latitude", "id", "obs", "activity", "activity_plus",
## "predict.glm.int", "predict.glm", "circle2000")
ggplot(R2.pred.plus.nb.RF)+geom_sf(aes(color=predict.RF),lwd=0.1)+geom_sf(data=st_crop(fond,st_bbox(R2.pred.plus.nb.RF)))+
  scale_color_manual(values = c("active" = "lightgreen","UK"="orange")) +
  ggtitle("Map of predict activities using RF method")

ggplot(R2.pred.plus.nb.RF)+geom_sf(aes(color=predict.glm),lwd=0.1)+geom_sf(data=st_crop(fond,st_bbox(R2.pred.plus.nb.RF)))+
  scale_color_manual(values = c("active" = "lightgreen","UK"="orange")) +
  ggtitle("Map of predict activities using GLM method")

Quality of models

Quality of models could be describe using SENSITIVITY, SPECIFICITY AND ACCURACY indicators.

Quality indicators specifications

The qualidy.ind function of the GPSMonitoring package aims to calculate those indicators using three parameters : the dataset, the name of the column for observed value and the column name for predict values.

qual.RF<-quality.ind(R2.pred.plus.nb.RF,col.activity='activity_plus',col.predict='predict.RF')

qual.glm<-quality.ind(R2.pred.plus.nb.RF,col.activity='activity_plus',col.predict='predict.glm')

qual.glm[[1]] %>% mutate(model='glm') %>% select(model,accuracy,sensitivity,specificity) %>% pivot_longer(cols=!model)->glm.ind
qual.RF[[1]] %>% mutate(model='RF') %>% select(model,accuracy,sensitivity,specificity) %>% pivot_longer(cols=!model)->RF.ind

rbind(glm.ind,RF.ind) %>% 
  ggplot()+geom_bar(aes(x=name,y=value,fill=model),stat='identity',position='dodge')+
  ggtitle("Quality comparaison between two last models GLM/RF")

Regular grid end fishing event

engin_encours<-'FMCy'

#Remettre avec un shapefile dans data
#fond<-st_read(con,query="select st_intersection(st_buffer(geom,0),ST_GeomFromText('POLYGON((-18 13,-13 13,-13 7.94,-18 7.94,-18 13))',4326)) as geom 
#                      from communes_uni")
data(grid)
ggplot(grid)+geom_sf(color=NA)+geom_sf(data=R2.pred.plus.nb.RF)

 st_join(grid,filter(R2.pred.plus.nb.RF,predict.RF=='active'),left=FALSE) %>% group_by(code_engin,id.x) %>% dplyr::summarize(total=n()) %>% 
   ggplot()+ geom_sf(aes(fill=total),lwd=0)+geom_sf(data=fond)+facet_wrap(~code_engin)+ scale_fill_continuous(trans = 'reverse')+
  xlim(as.numeric(bb$xmin),as.numeric(bb$xmax))+ ylim(as.numeric(bb$ymin),as.numeric(bb$ymax))+
   ggtitle("Map of fishing event density")
## `summarise()` has grouped output by 'code_engin'. You can override using the `.groups` argument.

st_join(grid,R2.pred.plus.nb.RF,left=FALSE) %>% group_by(code_engin,id.x) %>% dplyr::summarize(total=n()) %>% 
   ggplot()+ geom_sf(aes(fill=total),lwd=0)+geom_sf(data=fond)+facet_wrap(~code_engin)+ scale_fill_continuous(trans = 'reverse')+
  xlim(as.numeric(bb$xmin),as.numeric(bb$xmax))+ ylim(as.numeric(bb$ymin),as.numeric(bb$ymax))+
  ggtitle("Map of GPS data density")
## `summarise()` has grouped output by 'code_engin'. You can override using the `.groups` argument.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.